SQL Server 2005 : Data Querying and Reporting (part 2)

10/17/2010 5:49:51 PM

Querying a Sampling of the Data Stored

In some cases, you don’t want to run a query against all the data in a table. In such a situation, you can use the new TABLESAMPLE clause to limit the number of rows that any query processes. Unlike TOP, which returns only the first rows from a result set, TABLESAMPLE returns rows selected randomly by the system from throughout the set of rows processed by the query.

Note

TABLESAMPLE requires a sufficient amount of data to operate properly. In small tables, where all the data fits onto a single data page, the only possible returns are all (100%) and none (0%).

You can specify the conditions of the random selection by percentage or by number of rows, as in the two following examples:

SELECT * FROM Customers TABLESAMPLE(10 PERCENT)
SELECT * FROM Customers TABLESAMPLE(10 ROWS)

Selecting Rows Based on NULL Values

A NULL value is a value given to a field that that has no value. Many people confuse NULL values with zero-length strings or the value zero, but they are not the same. NULL is basically a fancy word for a value that is unknown. In SQL Server, you can select the desired NULL values or reject them by using ISNULL, as shown in the following query:

SELECT * FROM ONE.dbo.Customers
    WHERE ISNULL(StreetAddress, '#') = '#'

The ISNULL function operates on the basis of substitution. In the previous example, if a value for StreetAddress is not known for a particular customer, then within the query, it is replaced by and treated as the ‘#’ character, which matches the condition of the WHERE clause and returns customers with NULL addresses. NULLJOIN operations that formulate derived tables. values frequently show up as the result of

Relating Data from Multiple Tables

Joins and derived tables figure prominently in the 70-431 exam. Joins are the backbone of relational databases; they actually put the “relation” in relational databases. They can be used in all the main SQL statements (SELECT, INSERT, UPDATE, and DELETE). They are important. Also, derived tables tend to be overlooked, and they’re perceived as complicated even though they’re not; so they are also likely to show up on the 70-431 exam.

Joining tables is a natural occurrence in a relational database system. Many of the queries performed on a regular basis involve multiple tables. Whenever you query data from two tables, you need to find some way to relate the two tables. Connecting one table to another requires a common element of the tables. You would use a JOIN operation in this manner whenever you want to see a result set that includes columns from several tables.

You can use three basic join types, as well as a union operation, to connect tables:

Inner join— An inner join shows results only where there are matches between the elements. An inner join leaves out all the records that don’t have matches.
Outer join —An outer join can show all the records from one side of a relationship, records that match where they are available, and NULL values for records that do not have matches. An outer join shows all the same records as an inner join, plus all the records that don’t match.
Cross join— The cross join is less often used than the other two types of joins because it returns all possible combinations of rows between the two sides of the join. The number of records in the result set is equal to the number of records on one side of the join multiplied by the number of records on the other side of the join. No correlation is attempted between the two records; all the records from both sides are returned.

Again, cross joins are less frequently used than inner and outer joins. With an outer join, you are guaranteed to have all records returned from one side or the other. With inner joins, the only rows returned are those that match in the joined tables. It is easier to contemplate the overall processes if you consider join operations first. What you put in the WHERE clause and other clauses is applied after the joins are processed. So bear in mind that when a join returns a specified set of records, the SQL statement may or may not return all those records, depending on what you have specified in the WHERE clause.

Now let’s look at each of the different join operators: INNER JOIN and OUTER JOIN.

Outputting Only Matches: INNER JOIN

The INNER JOIN statement is the easiest and most often used join operator. For this reason, when an inner join operation is performed, the INNER portion of the syntax is optional. A rowset is returned from the operation for the records that match up based on the criteria provided through the ON clause. A one-to-many relationship is handled inside an application with the inner join. To show all orders for a particular customer, you could use the following join operation:

SELECT Orders.* FROM Orders
 JOIN Customers
  ON Orders.CustomerID = Customers.CustomerID

The results from this query may appear a little on the unusual side because no sorting is performed. Typically, an order is specified or information is grouped to make the output more usable.

Returning Output Even When No Match Exists: OUTER JOIN

You can use an outer join when you want to return all of one entire list of rows from one side of the join. There are three types of outer joins: left, right, and full. The terms left and right are used based on the positioning of the tables within the query. The first table is the right; the second is the left. You may find it easiest to draw a picture to represent the table when you are first learning outer joins. A right outer join, often abbreviated as right join, returns all the rows belonging to the table on the right side and only the matching rows on the table on the left side. Conversely, a left outer join returns all the rows from the table on the left side. A full outer join returns all the rows from both sides that have correlations.

Left and right outer joins, for all intents and purposes, are the same operations; they are simply a matter of the position of the tables within the queries. Therefore, the following examples use only the left outer join syntax, which is the one typically used. In the current ANSI standard, RIGHT is the table name on the right side of the JOIN keyword, and LEFT is the table being joined. The following query produces a listing of all customers and their orders:

SELECT * FROM Customers AS C
 LEFT JOIN Orders AS O
 ON C.CustomerID = O.CustomerID

Customers that have never placed an order would still be in the listing, accompanied by NULL for the OrderID.

A cross join, also known as a Cartesian join, is rarely used because it connects every element in one table with every other element of another table. This has some bearing in statistical operations but is not relevant in most business processing.

Applying Conditional Data Filtering

You apply filtering to data to determine the data to be selected based on conditional requirements. Essentially, all conditions come down to one of three possible outcomes. If two values are compared, the result is positive, negative, or equal (greater than, less than, or equal to). Actually, filters always evaluate to a Boolean result, either True or False. BETWEEN 10 and 20 is either True or False. Even numeric tests, such as month > 0 and Price > 10.00, are either True or False.

BETWEEN, IN, and LIKE

With the help of the BETWEEN keyword, you can specify ranges when using the WHERE clause. Simply put, BETWEEN provides a range of values within which the data should lie; otherwise, the data does not meet the condition. BETWEEN is inclusive, meaning that the range includes the lower value specified and the upper value specified. For example, the following query would have the values 10 and 20 as a possibility in the results:

SELECT * FROM Products WHERE UnitPrice BETWEEN 10 AND 20

If the intent is to exclude the value 20, the query would be written like this:

SELECT * FROM Products WHERE UnitPrice BETWEEN 10.00 AND 19.99

You can also incorporate something known as a list when using the WHERE clause. Essentially, a list specifies the exact values a column may or may not take. If the record does not contain the value for the column specified in the IN list, it is not selected. IN determines whether a given value matches a set of values listed. Here is an example:

SELECT * FROM Customers WHERE Country IN ('UK', 'USA')

This example limits the values of Country to only UK and USA. Customers who live in the countries mentioned in the IN list are the only ones listed.

You can retrieve rows that are based on portions of character strings by using the LIKE predicate. The LIKE predicate determines whether a given character string matches a specified pattern. The data types a LIKE statement can work with are char, varchar, nvarchar, nchar, datetime, smalldatetime, and text. A pattern specified in the LIKE predicate can include regular characters and wildcard characters. During pattern matching, regular characters must exactly match the characters specified in the column value. Wildcard characters can be matched with any character or set of characters, according to the wildcard character used, as shown in Table 1.

Table 1. Wildcard Characters Allowed in T-SQL
Character	Meaning
[]	Any single character within the specified range (for example, [f-j]) or set (for example, [fghij])
_ (underscore)	Any single character
%	Any number of zero or more characters
[ ^ ]	Any single character not in the specified range or set

If an application repeatedly calls the LIKE predicate and performs numerous wildcard searches, you should consider using the MS Search facility if it is installed and in use on the server. Consider the value of the response time over the storage resources that the MS Search service and full-text search capabilities require. MS Search service is required to use a full-text search. Full-text searching enables a variety of powerful wildcard searches. You should avoid LIKE searches that have a % wildcard at both the beginning and the end. The following example shows how the LIKE clause uses the % wildcard to select all customers whose CustomerIDA: begins with the letter

SELECT CustomerID, ContactName FROM Customers
 WHERE CustomerID LIKE 'A%'

You can also use the NOT keyword with the LIKE predicate to simply retrieve a query that does not contain records matching the specified elements in the LIKENOT. clause. With character matching, it is sometimes more efficient to exclude characters by using